Unsupervised and domain-independent extraction of technical terms from scientific articles in digital libraries
Abstract
A central issue in making the contents of documents in a digital library accessible to the user is the identification and extraction of technical terms. We propose a method that solves this task in an unsupervised, domain-independent way: we use a nominal group chunker to extract term candidates, and we select the technical terms from these candidates based on string frequencies retrieved using the MSN search engine.
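The two steps named in the abstract, candidate extraction by nominal group chunking and selection by string frequency, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The coarse POS tags, the function names `nominal_group_candidates` and `select_terms`, and the `max_frequency` cut-off are all hypothetical. The abstract does not say how the frequencies are turned into a decision, so the rule used here (keep candidates that are rare on the general web) is an assumption, and an in-memory table stands in for search-engine hit counts.

```python
from typing import Callable, List, Tuple

Tagged = Tuple[str, str]  # (token, coarse POS tag); the tag set is an assumption


def nominal_group_candidates(tagged: List[Tagged]) -> List[str]:
    """Collect maximal adjective/noun runs that end in a noun."""
    candidates: List[str] = []
    run: List[Tagged] = []
    # A sentinel with a non-nominal tag flushes the final run.
    for token, tag in tagged + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((token, tag))
            continue
        # Drop trailing adjectives so every candidate ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            candidates.append(" ".join(tok for tok, _ in run))
        run = []
    return candidates


def select_terms(
    candidates: List[str],
    frequency_of: Callable[[str], int],
    max_frequency: int,
) -> List[str]:
    """Keep candidates whose string frequency is at most max_frequency.

    Assumption: technical terms are rarer in general text than everyday
    phrases. The source does not state the actual selection criterion.
    """
    selected: List[str] = []
    for candidate in candidates:
        key = candidate.lower()
        if key not in selected and frequency_of(key) <= max_frequency:
            selected.append(key)
    return selected


# Hand-tagged toy sentence; a real system would use a POS tagger and chunker.
sentence = [
    ("The", "DET"), ("nominal", "ADJ"), ("group", "NOUN"), ("chunker", "NOUN"),
    ("extracts", "VERB"), ("new", "ADJ"), ("ideas", "NOUN"), ("quickly", "ADV"),
]

# Made-up counts standing in for search-engine string frequencies.
toy_counts = {"nominal group chunker": 1_200, "new ideas": 90_000_000}

terms = select_terms(
    nominal_group_candidates(sentence),
    frequency_of=lambda phrase: toy_counts.get(phrase, 0),
    max_frequency=1_000_000,
)
print(terms)
```

Passing the frequency lookup as a function keeps the selection step independent of any particular search service, so the toy table can be replaced by a real lookup without changing the rest of the sketch.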
Similar resources
Unsupervised Metadata Extraction in Scientific Digital Libraries Using A-Priori Domain-Specific Knowledge
Information extraction from unstructured sources is a crucial step in the semantic annotation of content. The challenge lies in supporting a high-quality automatic (or at least semi-automatic) approach in order to sustain the scalability of the semantic-enabled services of the future. Unsupervised information extraction encompasses a number of underlying research problems, such as natural langua...
A New Domain Independent Keyphrase Extraction System
In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from the input document. We incorporate linguistic knowledge (i.e., part-of-speech tags) and statistical information (i.e., frequency, position, lifespan) of each n-gram in defining candidate phrases and their respectiv...
A Systematic Review of Data Mining Applications in Digital Libraries
Purpose: This study aimed to identify the applications of data mining in the provision of services and in the collection and management of digital libraries. Methodology: This is an applied study in terms of purpose and a qualitative study in terms of method, conducted as a systematic review. For this purpose, articles were obtained by searching the databases of Springer, Emerald, ProQuest,...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequately labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...
The 101st Issue of the Journal and its Editorial Concerns
Investigating User Search Tactic Patterns and System Support in Using Digital Libraries Now, the journal has reached its 101st issue. Founded 29 years ago (1991) by the Iran Public Libraries Foundation, it began publishing under the title Payam-e Ketabkhane (in English, Message of Librarianship). In 2009, the journal received a scientific promotion from the Ministry of Science, Research and Techn...